Best Action Selection in a Stochastic Environment

Authors

  • Yingce Xia
  • Tao Qin
  • Nenghai Yu
  • Tie-Yan Liu
Abstract

We study the problem of selecting the best action from multiple candidates in a stochastic environment. In this setting, a player who takes an action receives a random reward and incurs a random cost, each drawn from an unknown distribution. We aim to select the best action, the one with the maximum ratio of expected reward to expected cost, after exploring the actions for n rounds. In particular, we study three mechanisms: (i) the uniform exploration mechanism MU; (ii) the successive elimination mechanism MSE; and (iii) the ratio confidence bound exploration mechanism MRCB. We prove that, for all three mechanisms, the probability that the best action is not selected (i.e., the error probability) can be upper bounded by O(exp{−cn}), where c is a constant that depends on the mechanism and on the parameters of the actions. We then give an asymptotic lower bound on the error probability of consistent mechanisms in the Bernoulli setting, and discuss its relationship with the upper bounds in different aspects. Our proposed mechanisms can be specialized to cover the cases where only the rewards or only the costs are random. We also test the proposed mechanisms through numerical experiments.
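The selection rule described in the abstract can be illustrated with a minimal sketch of uniform exploration. This is not the authors' MU mechanism as specified in the paper; it only assumes that uniform exploration means pulling the actions in round-robin order and then picking the largest ratio of empirical mean reward to empirical mean cost. The names `make_bernoulli_env`, `uniform_exploration`, and `pull` are hypothetical, and treating an action with zero observed cost as ineligible is an assumption made to avoid division by zero.

```python
import random


def make_bernoulli_env(reward_means, cost_means, seed=0):
    """Hypothetical test environment: Bernoulli rewards and costs per action."""
    rng = random.Random(seed)

    def pull(i):
        # Draw an independent Bernoulli reward and cost for action i.
        reward = 1.0 if rng.random() < reward_means[i] else 0.0
        cost = 1.0 if rng.random() < cost_means[i] else 0.0
        return reward, cost

    return pull


def uniform_exploration(pull, num_actions, n_rounds):
    """Pull actions round-robin for n_rounds, then return the index with the
    largest ratio of empirical mean reward to empirical mean cost."""
    reward_sum = [0.0] * num_actions
    cost_sum = [0.0] * num_actions
    for t in range(n_rounds):
        i = t % num_actions  # each action gets (almost) the same number of pulls
        reward, cost = pull(i)
        reward_sum[i] += reward
        cost_sum[i] += cost

    def ratio(i):
        # The pull count cancels, so the ratio of empirical means equals
        # the ratio of sums. Assumption: zero observed cost ranks last.
        if cost_sum[i] == 0.0:
            return float("-inf")
        return reward_sum[i] / cost_sum[i]

    return max(range(num_actions), key=ratio)


if __name__ == "__main__":
    env = make_bernoulli_env(reward_means=[0.9, 0.1], cost_means=[0.1, 0.9])
    print(uniform_exploration(env, num_actions=2, n_rounds=2000))
```

In this example the first action has a true reward-to-cost ratio of 9 and the second roughly 0.11, so with 2000 rounds the sketch should pick the first action with overwhelming probability, in line with the exponentially decaying error bound stated in the abstract.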


Similar resources

Stochastic Complexity of Reinforcement Learning

Using the asymptotic equipartition property which holds on empirical sequences we elucidate the explicit performance of exploration, and the fact that the return maximization is characterized by two factors, the stochastic complexity and a quantity depending on the parameters of environment. We also examine the sensitivity of stochastic complexity, which is useful in appropriately tuning the pa...


Multi-period project portfolio selection under risk considerations and stochastic income

This paper deals with multi-period project portfolio selection problem. In this problem, the available budget is invested on the best portfolio of projects in each period such that the net profit is maximized. We also consider more realistic assumptions to cover wider range of applications than those reported in previous studies. A novel mathematical model is presented to solve the problem, con...


A stochastic model for project selection and scheduling problem

Resource limitations at time zero may prevent some profitable projects from being selected in the project selection problem, so the simultaneous project portfolio selection and scheduling problem has received significant attention. In this study, budget, investment costs and earnings are considered to be stochastic. The objectives are maximizing net present values of selected projects and minimizing v...


Application of Stochastic Learning Automata to Intelligent Vehicle Control

A stochastic automaton can perform a finite number of actions in a random environment. When a specific action is performed, the environment responds by producing an environment output that is stochastically related to the action. This response may be favourable or unfavourable. The aim is to design an automaton that can determine the best action guided by past actions and responses. Using Stoch...


Automatic control based on Wasp Behavioral Model and Stochastic Learning Automata

A stochastic automaton can perform a finite number of actions in a random environment. When a specific action is performed, the environment responds by producing an environment output that is stochastically related to the action. The aim is to design an automaton, using a reinforcement scheme based on the computational model of wasp behaviour that can determine the best action guided by past ac...



Publication date: 2016